Localized convolutional neural networks for geospatial wind forecasting
Convolutional Neural Networks (CNNs) possess many positive qualities when it
comes to spatial raster data. Translation invariance enables CNNs to detect
features regardless of their position in the scene. However, in some domains,
such as the geospatial one, not all locations are exactly equal. In this work,
we propose localized convolutional neural networks that enable convolutional
architectures to learn local features in addition to global ones. We
investigate their instantiations in the form of learnable inputs, local
weights, and a more general form. They can be added to any convolutional
layer, are easily trained end-to-end, introduce minimal additional
complexity, and let CNNs retain most of
their benefits to the extent that they are needed. In this work we address
spatio-temporal prediction: we test the effectiveness of our methods on a
synthetic benchmark dataset and on three real-world wind prediction datasets.
For one of them, we propose a method to spatially order the unordered data. We
compare recent state-of-the-art spatio-temporal prediction models on the same
data; those that use convolutional layers are extended with our localizations.
In all these cases our extensions improve the results, and thus often advance
the state of the art. We share all the code in a public repository.
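The simplest of the proposed localizations, learnable inputs, can be illustrated with a minimal sketch (the function name, shapes, and layout below are illustrative assumptions, not the paper's implementation): a trainable per-location map is tiled across the batch and concatenated to the input channels, so any subsequent convolution can pick up location-dependent features on top of its translation-invariant ones.

```python
import numpy as np

def add_learnable_inputs(x, local_maps):
    """Concatenate trainable per-location maps to the input channels.

    x          -- input batch, shape (batch, channels, H, W)
    local_maps -- trainable parameters, shape (k, H, W); one value per
                  spatial location per extra channel, updated by the
                  optimizer like any other weight
    Returns an array of shape (batch, channels + k, H, W).
    """
    batch = x.shape[0]
    # Tile the same local maps over every sample in the batch.
    tiled = np.broadcast_to(local_maps, (batch,) + local_maps.shape)
    return np.concatenate([x, tiled], axis=1)

# Example: a 4-sample batch of 3-channel 8x8 rasters gains 2 local channels.
x = np.zeros((4, 3, 8, 8))
local_maps = np.random.default_rng(0).normal(size=(2, 8, 8))
out = add_learnable_inputs(x, local_maps)
print(out.shape)  # (4, 5, 8, 8)
```

Because the extra channels are fixed per location rather than shared across locations, the following convolution can condition its output on where in the raster it is applied, which is exactly the kind of local feature the abstract describes.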
Optimization Techniques for Hestenes-Jacobi SVD on FPGAs
Matrix decompositions such as the Singular Value Decomposition (SVD) are important compute-intensive tasks in a wide variety of fields, from radar and simulation to image processing and compression. In light of growing data sizes, accelerators such as FPGAs are often considered for SVD with the goal of increasing compute efficiency. However, high-performance computation of SVD demands high parallelism, which calls for a thorough reevaluation of the complexities involved in implementing the algorithm in light of hardware and algorithm advances. In this work, we investigate the Hestenes-Jacobi SVD (HJSVD) on FPGAs. Our findings show that the Hestenes-Jacobi method, while highly parallelizable, can become constrained by its hardware resource costs and requires careful tuning to achieve high throughput: the input matrix sizes and the degree of parallelism have an important effect on efficient pipelining of the architecture. We identify the key challenges in parallelizing the algorithm for modern workloads and incorporate three key optimizations: pipelining, fixed-point arithmetic instead of floating-point, and the use of heterogeneous resources (not only DSPs) for vector rotations.
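For readers unfamiliar with the method, the textbook one-sided (Hestenes) Jacobi SVD that the abstract builds on can be sketched in software as follows. This is a plain floating-point reference sketch, not the paper's FPGA design: columns are orthogonalized pairwise by plane rotations, and the singular values emerge as the final column norms. The column pairs within a sweep are independent, which is the source of the parallelism the abstract exploits in hardware.

```python
import numpy as np

def hestenes_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided (Hestenes) Jacobi SVD of an m x n matrix A (m >= n).

    Repeatedly applies plane rotations to pairs of columns of U (a working
    copy of A) until all columns are mutually orthogonal. The singular
    values are then the column norms of U; V accumulates the rotations.
    Returns (sigma, V) with A @ V having orthogonal columns of norm sigma.
    """
    U = np.array(A, dtype=float)
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]   # ||u_p||^2
                beta = U[:, q] @ U[:, q]    # ||u_q||^2
                gamma = U[:, p] @ U[:, q]   # <u_p, u_q>
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                # pair already orthogonal
                converged = False
                # Rotation angle chosen to zero the inner product gamma.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                # Rotate columns p and q of both U and V.
                up, uq = U[:, p].copy(), U[:, q].copy()
                U[:, p], U[:, q] = c * up - s * uq, s * up + c * uq
                vp, vq = V[:, p].copy(), V[:, q].copy()
                V[:, p], V[:, q] = c * vp - s * vq, s * vp + c * vq
        if converged:
            break
    sigma = np.linalg.norm(U, axis=0)
    return sigma, V
```

The inner double loop is one "sweep"; on an FPGA, disjoint column pairs of a sweep can be rotated concurrently, and each rotation reduces to multiply-accumulate operations, which is why the choice between DSPs and other fabric resources for the rotations, as well as fixed- versus floating-point arithmetic, matters so much for throughput.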